LIST OF EXERCISES
=================
1. Design a MapReduce technique for word counting using Python on a Hadoop cluster.

2. Develop a MapReduce algorithm to find the coolest year in the available weather
data using a Java program on a Hadoop cluster.

3. Design a Bloom filter to remove duplicate users from the log file, and analyse the
filter with different cases.

4. Implement the Flajolet-Martin algorithm to estimate the number of distinct Twitter
users in the Twitter data set.

5. Demonstrate the significance of the PageRank algorithm on the Hadoop platform with an
available data set, using a MapReduce-based matrix-vector multiplication algorithm.

6. Design a friend-of-friend network from the social network data using the
Girvan-Newman algorithm.

7. Demonstrate relational algebra operations such as sort, group, join, project, and filter
using Hive and Pig.

8. Load unstructured data into Hadoop and convert it into structured data using
Hive. Develop Hive and HBase databases, tables, views, functions, and indexes, and
perform some basic query operations.

9. Implement Pig Latin scripts to sort, group, join, project, and filter your data.

10. Implement a collaborative filtering system using PySpark.

11. Run the logistic regression, SVM, and decision tree classifier algorithms using
PySpark, display the results as graphs, and compare the accuracy of the algorithms
using precision, recall, and F-measure.

12. Implement the K-means clustering algorithm using PySpark.
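
Starting sketch for Exercise 1 (word count). This is one possible shape for the map and reduce steps, written as plain Python so it can be tried locally before moving to the cluster. The function names (mapper, reducer, word_count) and the tokenising regular expression are illustrative assumptions, not part of the exercise; on a real cluster the two steps would typically be separate scripts that read standard input and write tab-separated pairs.

```python
import itertools
import re


def mapper(lines):
    """Map step: emit (word, 1) for every word found in the input lines."""
    for line in lines:
        # Assumption: a "word" is a run of letters, digits, or apostrophes.
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word, 1


def reducer(sorted_pairs):
    """Reduce step: sum the counts per word. Input must be sorted by word."""
    grouped = itertools.groupby(sorted_pairs, key=lambda pair: pair[0])
    for word, group in grouped:
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort, and reduce locally on an iterable of text lines."""
    # sorted() stands in for Hadoop's shuffle-and-sort phase.
    return dict(reducer(sorted(mapper(lines))))
```

For example, word_count(["the cat", "The dog the"]) counts "the" three times because the mapper lower-cases its input.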
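
Starting sketch for Exercise 3 (Bloom filter). The class below is a minimal illustration under stated assumptions: the bit-array size, the number of hash functions, and the use of seeded SHA-256 digests as hash functions are all arbitrary choices made for the example, and the names BloomFilter and unique_users are hypothetical. Varying num_bits and num_hashes is one way to explore the "different cases" the exercise asks about.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter backed by a bytearray used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: seeded SHA-256 digests act as independent hash functions.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def unique_users(user_ids, bloom):
    """Yield each user the filter has not seen yet, then record it.

    A false positive makes the filter drop a user that is actually new,
    so the output may be slightly smaller than the true set of users.
    """
    for user in user_ids:
        if user not in bloom:
            bloom.add(user)
            yield user
```

A Bloom filter never reports a stored item as absent, so every repeated user is removed; the cost is an occasional false positive, whose rate grows as the bit array fills up.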
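
Starting sketch for Exercise 4 (Flajolet-Martin). The code below illustrates the basic idea only: hash each item, track the largest number of trailing zero bits R seen per hash function, and use 2^R as an estimate of the distinct count. Combining the per-hash estimates by averaging within groups and taking the median of the group averages is one common textbook approach; the group size, the hash construction, and the names trailing_zeros and fm_estimate are assumptions made for the example.

```python
import hashlib
import statistics


def trailing_zeros(n, width=32):
    """Number of trailing zero bits in n; by convention `width` when n is 0."""
    if n == 0:
        return width
    return (n & -n).bit_length() - 1


def fm_estimate(items, num_hashes=32, group_size=4):
    """Estimate the number of distinct items in a stream.

    Note: an empty stream yields 1.0, since every R starts at 0.
    """
    max_zeros = [0] * num_hashes
    for item in items:
        for seed in range(num_hashes):
            # Assumption: seeded SHA-256 digests act as independent hashes.
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            value = int.from_bytes(digest[:4], "big")
            max_zeros[seed] = max(max_zeros[seed], trailing_zeros(value))

    estimates = [2 ** r for r in max_zeros]
    # Average within small groups, then take the median across groups,
    # which damps the effect of a single unusually large R.
    group_means = [
        statistics.mean(estimates[i:i + group_size])
        for i in range(0, num_hashes, group_size)
    ]
    return statistics.median(group_means)
```

Because only the maximum per hash function is stored, repeated users do not change the estimate, and the memory used does not depend on the length of the stream.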
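
Starting sketch for Exercise 5 (PageRank via matrix-vector multiplication). This local illustration expresses one multiplication as a map step and a reduce step, then repeats it as a power iteration. It assumes every page has at least one outgoing link and that every linked page appears as a key in the input dictionary; the damping factor 0.85, the iteration count, and the names map_step, reduce_step, and pagerank are assumptions made for the example.

```python
from collections import defaultdict


def map_step(matrix_entries, vector):
    """Map: for each nonzero entry (i, j, m_ij), emit (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]


def reduce_step(pairs):
    """Reduce: sum the partial products that share the same row index."""
    sums = defaultdict(float)
    for i, value in pairs:
        sums[i] += value
    return sums


def pagerank(links, beta=0.85, iterations=50):
    """Power iteration over a dict mapping each page to its outgoing links.

    Assumption: no dead ends, and every destination is also a key of `links`.
    """
    nodes = sorted(links)
    n = len(nodes)
    # Column-stochastic transition matrix stored as (row, column, value).
    entries = [
        (dst, src, 1.0 / len(outs))
        for src, outs in links.items()
        for dst in outs
    ]
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        sums = reduce_step(map_step(entries, rank))
        rank = {
            node: beta * sums.get(node, 0.0) + (1.0 - beta) / n
            for node in nodes
        }
    return rank
```

On Hadoop, the matrix entries would be the input records and the rank vector would be shared with every mapper (or partitioned when it does not fit in memory), with one MapReduce job per iteration.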